

## TIMING PERFORMANCE ANALYSIS

FIELD OF THE INVENTION

**[0001]** The present invention relates generally to timing performance analysis, and more particularly to timing performance analysis for an integrated circuit comprising an embedded device.

BACKGROUND OF THE INVENTION

**[0002]** The process for producing an integrated circuit comprises many steps. Conventionally, a logic design is followed by a circuit design, which is followed by a layout design. With respect to the circuit design and layout portion, once circuits for an integrated circuit have been designed, such designs are converted to a physical representation known as a "circuit layout" or "layout." Layout is exceptionally important to developing a working design as it affects many aspects, including, but not limited to, signal noise, signal time delay, resistance, cell area, and parasitic effect.

**[0003]** Once a circuit is designed and laid out, it is often simulated to ensure performance criteria are met, including, but not limited to, signal timing. This type of analysis is difficult at the outset, and is made more difficult by an embedded design. An embedded design or embedded circuit is conventionally designed separately from an integrated circuit in which it is embedded. Sometimes this embedded circuit is referred to as an intellectual property (IP) core or embedded core. This is because the information to build and test such an embedded circuit is provided from one company to another.

**[0004]** An IP core may have a certain maximum timing performance for input and output. For example, a microprocessor will have certain maximum timing performance for input and output of data and other information to a memory, or more particularly, a memory controller. In personal computer manufacture, operation of memory, or more particularly memory modules, is specified for a bus "speed," such as 33 MHz, 66 MHz, and so on. Presently, the Rambus Signaling Level road map is for a memory to processor bus frequency of 1.2 GHz. However, processors presently operate at speeds in excess of 1.2 GHz, and thus processors must be slowed down for communicating with memory. Moreover, memory is speed graded, and conventionally slower memory costs less than faster memory.

**[0005]** However, there is no de facto standard bus interface for an embedded microprocessor. Accordingly, glue or gasket logic and/or interconnects are used to couple an embedded microprocessor to a host device, such as a programmable logic device. Programmable logic devices exist as a well-known type of integrated circuit that may be programmed by a user to perform specified logic functions. There are different types of programmable logic devices, such as programmable logic arrays (PLAs) and complex programmable logic devices (CPLDs). One type of programmable logic device, called a field programmable gate array (FPGA), is very popular because of a superior combination of capacity, flexibility and cost.

**[0006]** Accordingly, it would be desirable and useful to provide a method and apparatus for timing performance analysis for an embedded device.

SUMMARY OF THE INVENTION

**[0007]** An aspect of the present invention is a method for performing a timing analysis for a core device to be embedded in a host integrated circuit. Clock-to-output timing information is obtained for the core device. Setup and hold timing information and delay timing information are determined for a portion of the host integrated circuit. The clock-to-output timing information, the setup and hold timing information and the delay timing information are associated with respective signals, and a path time delay for each of the respective signals is calculated.

**[0008]** An aspect of the present invention is a method for performing a timing analysis for a core device in a host integrated circuit. Setup and hold timing information is obtained for the core device. Clock-to-output timing information and delay timing information are determined for a portion of the host integrated circuit. The clock-to-output timing information, the setup and hold timing information and the delay timing information are associated with respective signals, and a path time delay for each of the respective signals is calculated.

**[0009]** An aspect of the present invention is a method for determining timing performance. Clock-to-output times for a processor core are obtained. Static timing analysis is used to determine timing data for a memory controller. Setup and hold times are obtained from the timing data for the memory controller. A programmatic representation of logic and interconnects for coupling the memory controller and the processor core is provided. The programmatic representation of logic and interconnects is simulated to obtain delay times. The delay times, the setup and hold times and the clock-to-output times are used as inputs to a spreadsheet, and path times are determined from the spreadsheet.

**[0010]** An aspect of the present invention is a method for determining timing performance. Setup and hold times for a processor core are obtained. Static timing analysis is used to determine timing data for a memory controller. Clock-to-output times are obtained from the timing data for the memory controller. A programmatic representation of logic and interconnects for coupling the memory controller and the processor core is provided. The programmatic representation of logic and interconnects is simulated to obtain delay times. The delay times, the setup and hold times and the clock-to-output times are provided as input to a spreadsheet, and path times are determined from the spreadsheet.

BRIEF DESCRIPTION OF THE DRAWINGS

**[0011]** So that the manner in which the above recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.

**[0012]** It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the present invention may admit to other equally effective embodiments.

**[0013]** FIG. 1 is a block diagram of an exemplary portion of an embodiment of an integrated circuit in accordance with one or more aspects of the present invention.

**[0014]** FIG. 2 is a timing diagram for the integrated circuit portion of FIG. 1.

**[0015]** FIGS. 3 and 4 are flow diagrams of respective exemplary embodiments of timing performance analysis processes for output and input paths, respectively, for the integrated circuit of FIG. 1 in accordance with one or more aspects of the present invention.

DETAILED DESCRIPTION OF THE DRAWINGS

**[0016]** In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the present invention.

**[0017]** Referring to FIG. 1, there is shown a block diagram of an exemplary portion of an embodiment of an integrated circuit 100. Integrated circuit 100 comprises an embedded core 110, such as an embedded microprocessor core, on-chip memory controller (OCM) 120, and gasket or glue logic ("G-logic") and interconnects 115-A and 115-B. Integrated circuit 100 may be a programmable logic device, such as an FPGA. Accordingly, OCM 120 may be programmed from FPGA circuit fabric, or may be a dedicated memory controller circuit, or a combination thereof. Furthermore, FPGAs conventionally comprise memory and a memory controller, and thus such a memory controller may be used to form at least a portion of OCM 120. Integrated circuit 100 may be formed after a timing analysis in accordance with one or more aspects of the present invention.

**[0018]** There are two signal paths to and from embedded core 110, namely input path 113 and output path 114. Clock signal 109 is provided to embedded core 110 and OCM 120. Each signal path 113, 114 represents provisioning of data, control and address information to and from embedded core 110. Accordingly, each signal path 113, 114 represents more than one signal path. Notably, a maximum time allowed without downgrading for speed is:

$$T_{\text{path}} = 1/f_{\text{min}} \quad (1)$$

where $f_{\text{min}}$ represents a minimum acceptable operating frequency for a system. Notably, $f_{\text{min}}$ may be set equivalent to a maximum operating frequency for transferring information to and from embedded core 110. Thus, embedded core 110 is used to determine at least an initial value of $f_{\text{min}}$. Because there is more than one signal, each signal will have an associated $T_{path}$. However, though each $T_{path}$ delay may be the same for two or more signals, it may also be different, depending on routing circuitry and the like. Thus, $T_{path}$ is evaluated for each signal to determine $T_{path}$ for a system. However, as the analysis for each signal is the same, only one input signal and one output signal are described in the exemplary timing diagram of FIG. 2 for purposes of clarity.
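As a minimal sketch of equation (1), the following Python converts a minimum acceptable operating frequency into a path-time budget in picoseconds. The function name `path_budget_ps` and the 250 MHz figure are illustrative assumptions and do not come from the specification.

```python
def path_budget_ps(f_min_hz: float) -> float:
    """Return T_path = 1 / f_min, expressed in picoseconds."""
    if f_min_hz <= 0:
        raise ValueError("f_min must be positive")
    # 1 second = 1e12 picoseconds
    return 1e12 / f_min_hz


# Hypothetical example: a 250 MHz minimum operating frequency.
print(path_budget_ps(250e6))  # 4000.0
```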

**[0019]** Referring to FIG. 2, there is shown a timing diagram for integrated circuit 100 of FIG. 1. With continuing reference to FIG. 2 and renewed reference to FIG. 1, input path 113 comprises delays that go into determining $T_{path}$. OCM output signal 131 has a clock-to-output delay (C-O) 134. This means that from a first triggering edge 109-1 of clock signal 109 to OCM 120, a signal 131 to be outputted from OCM 120 is delayed by an amount of a clock-to-output delay 134 before it is outputted from OCM 120, as indicated by transition 131-1. Another delay in determining $T_{path}$ is that caused by routing OCM output signal 131 through any G-logic and interconnect 115-B present on input path 113 with respect to communicating such signal. Accordingly, G-logic and interconnect (GL&I) output signal 132 is delayed by GL&I delay 135 with respect to OCM output signal 131, as indicated by transition 132-1. Notably, OCM output signal 131 and a GL&I output signal 132 may be the same signal, in which case transition 132-1 is just transition 131-1 further delayed.

**[0020]** Additionally, embedded core 110 comprises one or more setup and hold times. So, a setup and hold time for an incoming signal to embedded core 110 must be met before a next triggering edge 109-2 of clock signal 109. Embedded core input signal 133 is equivalent to GL&I output signal 132, as indicated by each having transition 132-1. However, embedded core input signal 133 is used in FIG. 2 to clearly delineate setup and hold time (Setup Time) 136 as measured from transition 132-1 of core input signal 133 to triggering edge 109-2 of clock signal 109.

**[0021]** It should be understood that for this embodiment  $T_{path}$  is to be less than one period of clock signal 109 to ensure integrated circuit 100 may operate at  $f_{min}$  of embedded core 110. In other words, it may be a goal to have clock signal 109 with a frequency of  $f_{min}$ . This is important because embedded core 110 may be provided as a "hard macro," namely, a fixed layout formed with a set minimum lithographic feature size. In other words, if embedded core 110 may not be changed, then operating at an optimum frequency of embedded core 110 is a set target operating speed. Notably, though the embodiment described herein is a single data rate, the present invention may be used with double data rate timing.

**[0022]** Output path 114 has delays similar to those of input path 113. Accordingly, embedded core 110 provides core output signal 121 delayed by a clock-to-output delay 124 as measured from a triggering edge 109-1 of clock signal 109 to transition 121-1 of output signal 121. Measured from transition 121-1 to transition 122-1 is GL&I delay 125 of GL&I output signal 122 due to G-logic and/or interconnect 115-A. OCM input signal 123 is equivalent to GL&I output signal 122, as indicated by each having transition 122-1. However, OCM input signal 123 is used in FIG. 2 to clearly delineate setup time 126 as measured from transition 122-1 of OCM input signal 123 to triggering edge 109-2 of clock signal 109.
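Both paths therefore decompose into the same three contributions measured between triggering edges 109-1 and 109-2. As a sketch, using $T_{\text{CO}}$, $T_{\text{GLI}}$ and $T_{\text{SH}}$ as shorthand introduced here (not symbols from the figures) for the clock-to-output delay, the GL&I delay and the setup and hold time of one signal, the timing requirement for that signal may be written as:

$$T_{\text{CO}} + T_{\text{GLI}} + T_{\text{SH}} \le T_{\text{path}}$$

For input path 113 the three terms correspond to delays 134, 135 and 136, and for output path 114 to delays 124, 125 and 126. Whether the comparison is strict or allows equality is a design choice.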

**[0023]** Conventionally, embedded core 110 is provided with performance data including setup and hold times and clock-to-output times. These times may be provided in a known format, such as Standard Delay Format (SDF). Based on the assumption that setup and hold times and clock-to-output times are provided or determined, such as from simulation or testing prior to embedding, for embedded core 110, flow diagrams of FIGS. 3 and 4 are described.

**[0024]** Referring to FIG. 3, there is shown a flow diagram of an exemplary embodiment of a timing performance analysis process 300 in accordance with one or more aspects of the present invention. With continuing reference to FIG. 3 and renewed reference to FIG. 1, timing performance analysis process 300 is described. Timing performance analysis process 300 is for output path 114.

**[0025]** At step 301, clock-to-output times are obtained for embedded core 110. At step 302, a static timing analysis is done on OCM 120. This static timing analysis is done by simulation at a transistor level, and may be done with a product called PathMill from Synopsys of Mountain View, California. At step 303, data from step 302 is used to determine respective setup and hold times for signals to be inputted to OCM 120.

**[0026]** At step 304, a programmatic representation of gasket logic and interconnects 115-A is provided. Such a representation may be done in Verilog or VHDL, for example. At step 305, this programmatic representation is taken down from a logic level to something closer to a physical or transistor level, such as with HSpice or a like simulation program, and simulated to get delays associated with signals passing through gasket logic and interconnects 115-A.

**[0027]** At step 306, outputs from steps 303 and 305, namely, setup and hold times for OCM 120 and signal delays for gasket logic and interconnects 115-A, respectively, are associated with clock-to-output times for embedded core 110 from step 301. This association may be done using a spreadsheet, a database and the like. For example, assuming data\_out\_1 from embedded core 110 is under consideration, then a spreadsheet association may look something like that shown in Table I,

Table I

| Signal | C-O | GL&I Delay | Setup/Hold | Total Time |
|--------|-----|------------|------------|------------|
| DO1    | 100 | 50         | 25         | 175        |

where all values are expressed in units of time, such as picoseconds for example.
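The association in Table I can also be sketched in code rather than in a spreadsheet. The following Python is illustrative only: the class name `PathTiming` and its field names are assumptions made for this example, and the values are those of the DO1 row above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathTiming:
    """One row of the association table; all times in picoseconds."""
    signal: str
    clock_to_output: float  # C-O time of the driving device
    gli_delay: float        # delay through gasket logic and interconnects
    setup_hold: float       # setup and hold time of the receiving device

    @property
    def total_time(self) -> float:
        # Total time is the sum of the three contributions.
        return self.clock_to_output + self.gli_delay + self.setup_hold


# Row DO1 of Table I.
do1 = PathTiming("DO1", clock_to_output=100, gli_delay=50, setup_hold=25)
print(do1.signal, do1.total_time)  # DO1 175
```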

**[0028]** At step 307, critical paths are identified by totaling C-O delay, GL&I delay and Setup/Hold time to provide a total time for each signal traveling along output path 114. Accordingly, a total time, $T_i$, is determined for each signal on output path 114 going from embedded core 110 to OCM 120.

**[0029]** At step 308,  $T_i$  is compared to  $T_{path}$ . For example, it may be determined whether  $T_{path}$  is greater than or equal to  $T_i$  for each signal. Notably, it should be appreciated that if  $T_i$  was equal to  $T_{path}$ , then there would be "critical" timing. Accordingly, at step 308, such a check may be for  $T_{path}$  greater than  $T_i$  to avoid critical timing. Moreover, to ensure a margin of error,  $T'_{path}$ , which is approximately 1 to 10 percent, for example, less than  $T_{path}$ , may be used at step 308 for comparison with  $T_i$ . For purposes of clarity, the remainder of FIG. 3 is described as though  $T_i$  must be less than or equal to  $T_{path}$ , though it should be understood that other comparisons may be used.
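The comparison variants described for step 308 can be sketched as a single check. This is a minimal illustration, not the claimed method: the function name `meets_timing` and its `margin` and `strict` parameters are assumptions, with `margin` standing for the fraction by which $T_{path}$ is reduced to obtain $T'_{path}$.

```python
def meets_timing(t_i: float, t_path: float,
                 margin: float = 0.0, strict: bool = False) -> bool:
    """Check one signal's total time T_i against the path budget.

    margin: fraction by which T_path is reduced to form T'_path
            (0.01 to 0.10 matches the 1 to 10 percent example).
    strict: if True, require T_i < budget so that T_i equal to the
            budget ("critical" timing) is rejected.
    """
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    budget = t_path * (1.0 - margin)  # T'_path when margin > 0
    return t_i < budget if strict else t_i <= budget


# Hypothetical 175 ps budget against the 175 ps total of Table I.
print(meets_timing(175, 175))               # True
print(meets_timing(175, 175, strict=True))  # False
print(meets_timing(175, 175, margin=0.05))  # False
```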

**[0030]** Alternatively, timing performance analysis process 300 may end at step 307. This is because a largest value of times  $T_i$  may be determined, and frequency of operation of output path 114 of embedded core 110 may be set from there.
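One way to read this alternative is to invert equation (1) for the largest total time. The sketch below assumes the operating frequency is taken as the reciprocal of the largest $T_i$ with no added margin. The function name `max_frequency_hz` and the second signal `DO2` are hypothetical.

```python
def max_frequency_hz(total_times_ps: dict[str, float]) -> tuple[str, float]:
    """Return the slowest signal and the frequency its total time allows."""
    if not total_times_ps:
        raise ValueError("at least one signal is required")
    # The signal with the largest total time limits the whole path.
    worst = max(total_times_ps, key=total_times_ps.get)
    # Times are in picoseconds, so f = 1e12 / T gives hertz.
    return worst, 1e12 / total_times_ps[worst]


# DO1 is taken from Table I; DO2 is a made-up second signal.
signal, freq = max_frequency_hz({"DO1": 175, "DO2": 200})
print(signal, freq / 1e9)  # DO2 5.0
```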

**[0031]** However, assuming either or both OCM 120 and gasket logic and interconnects 115-A may be modified, if any $T_i$ is greater than $T_{path}$, then at step 310 circuitry from either or both OCM 120 and gasket logic and interconnects 115-A is modified to reduce time associated with identified critical paths, namely, signal paths producing $T_i$'s greater than $T_{path}$. In response to modification of circuitry at step 310, layout for such modified circuitry is made at step 311 and circuitry values associated therewith including, but not limited to, resistance, capacitance, among others both actual and parasitic, are extracted. Modified circuitry and associated circuitry values are fed back at steps 302 and 304, as applicable. For example, if a modification to gasket logic and interconnects 115-A results in no change to OCM 120, then there is nothing to feed back for OCM 120, and vice versa where a change to OCM 120 results in no change to gasket logic and interconnects 115-A. Of course, modification may be made to both OCM 120 and gasket logic and interconnects 115-A, resulting in feedback for both.

**[0032]** Timing performance analysis process 300 may continue, until at step 308 each  $T_i$  is less than or equal to  $T_{path}$ , in which event timing performance analysis process 300 ends at step 309. Notably, timing performance analysis process 300 works with embedded core 110 formed with a lithography of a first minimum dimension and OCM 120/gasket logic and interconnects 115-A formed with a lithography of a second minimum dimension different than the first minimum dimension. So, for example, embedded core 110 may be formed using .13 micron lithography and OCM 120/gasket logic and interconnects 115-A may be formed using .18 micron lithography.
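The feedback loop of process 300 can be sketched as an analyze, compare and modify iteration. Everything named here is a hypothetical stand-in: `analyze` represents steps 301 through 307, `modify` represents steps 310 and 311, and the `max_iterations` cap is an assumption added so the sketch always terminates.

```python
from typing import Callable, Dict

Times = Dict[str, float]


def close_timing(analyze: Callable[[], Times],
                 modify: Callable[[Times], None],
                 t_path: float, max_iterations: int = 10) -> Times:
    """Repeat analysis and modification until every T_i <= t_path."""
    for _ in range(max_iterations):
        totals = analyze()                 # T_i per signal
        critical = {s: t for s, t in totals.items() if t > t_path}
        if not critical:
            return totals                  # step 309: all paths pass
        modify(critical)                   # feeds back into re-analysis
    raise RuntimeError("timing not closed within the iteration limit")


# Toy stand-ins: each modification trims 20 ps from the GL&I delay.
delays = {"DO1": 50.0}


def analyze() -> Times:
    # C-O of 100 ps and setup/hold of 25 ps, as in Table I.
    return {s: 100.0 + d + 25.0 for s, d in delays.items()}


def modify(critical: Times) -> None:
    for s in critical:
        delays[s] -= 20.0


print(close_timing(analyze, modify, t_path=150.0))  # {'DO1': 135.0}
```

In this toy run the total drops from 175 ps to 155 ps and then to 135 ps, at which point the 150 ps budget is met and the loop ends.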

**[0033]** Referring to FIG. 4, there is shown a flow diagram of an exemplary embodiment of a timing performance analysis process 400 in accordance with one or more aspects of the present invention. With continuing reference to FIG. 4 and renewed reference to FIG. 1, timing performance analysis process 400 is described. Timing performance analysis process 400 is for input path 113.

**[0034]** At step 401, setup and hold times are obtained for embedded core 110. At step 402, a static timing analysis is done on OCM 120. This static timing analysis is done by simulation at a transistor level, and may be done with a product called PathMill from Synopsys of Mountain View, California. At step 403, data from step 402 is used to determine respective clock-to-output times for signals to be outputted from OCM 120.

**[0035]** At step 404, a programmatic representation of gasket logic and interconnects 115-B is provided. Such a representation may be done in Verilog or VHDL, for example. At step 405, this programmatic representation is taken down from a logic level to something closer to a physical or transistor level, such as with HSpice or a like simulation program, and simulated to get delays associated with signals passing through gasket logic and interconnects 115-B.

**[0036]** At step 406, outputs from steps 403 and 405, namely, clock-to-output times for OCM 120 and signal delays for gasket logic and interconnects 115-B, respectively, are associated with setup and hold times for embedded core 110 from step 401. This association may be done using a spreadsheet, a database and the like. For example, assuming data\_in\_1 to embedded core 110 is under consideration, then a spreadsheet association may look something like that shown in Table II,

Table II

| Signal | C-O | GL&I Delay | Setup/Hold | Total Time |
|--------|-----|------------|------------|------------|
| DI1    | 150 | 50         | 25         | 225        |

where all values are expressed in units of time, such as picoseconds for example.

**[0037]** At step 407, critical paths are identified by totaling C-O delay, GL&I delay and Setup/Hold time to provide a total time for each signal traveling along input path 113. Accordingly, a total time,  $T_i$ , is determined for each signal on input path 113 going from OCM 120 to embedded core 110.

**[0038]** At step 408, $T_i$ is compared to $T_{path}$. For example, it may be determined whether $T_{path}$ is greater than or equal to $T_i$ for each signal. Notably, it should be appreciated that if $T_i$ was equal to $T_{path}$, then there would be "critical" timing. Accordingly, at step 408, such a check may be for $T_{path}$ greater than $T_i$ to avoid critical timing. Moreover, to ensure a margin of error, $T'_{path}$, which is approximately 1 to 10 percent, for example, less than $T_{path}$, may be used at step 408 for comparison with $T_i$. For purposes of clarity, the remainder of FIG. 4 is described as though $T_i$ must be less than or equal to $T_{path}$, though it should be understood that other comparisons may be used.

**[0039]** Alternatively, timing performance analysis process 400 may end at step 407. This is because a largest value of times  $T_i$  may be determined, and frequency of operation of input path 113 of embedded core 110 may be set from there.

**[0040]** However, assuming either or both OCM 120 and gasket logic and interconnects 115-B may be modified, if any $T_i$ is greater than $T_{path}$, then at step 410 circuitry from either or both OCM 120 and gasket logic and interconnects 115-B is modified to reduce time associated with identified critical paths, namely, signal paths producing $T_i$'s greater than $T_{path}$. In response to modification of circuitry at step 410, layout for such modified circuitry is made at step 411 and circuitry values associated therewith including, but not limited to, resistance, capacitance, and inductance, among others both actual and parasitic, are extracted. Modified circuitry and associated circuitry values are fed back at steps 402 and 404, as applicable. For example, if a modification to gasket logic and interconnects 115-B results in no change to OCM 120, then there is nothing to feed back for OCM 120, and vice versa where a change to OCM 120 results in no change to gasket logic and interconnects 115-B. Of course, modification may be made to both OCM 120 and gasket logic and interconnects 115-B, resulting in feedback for both.

**[0041]** Timing performance analysis process 400 may continue, until at step 408 each $T_i$ is less than or equal to $T_{path}$, in which event timing performance analysis process 400 ends at step 409. Notably, timing performance analysis process 400 works with embedded core 110 formed with a lithography of a first minimum dimension and OCM 120/gasket logic and interconnects 115-B formed with a lithography of a second minimum dimension different than the first minimum dimension. So, for example, embedded core 110 may be formed using .13 micron lithography and OCM 120/gasket logic and interconnects 115-B may be formed using .18 micron lithography.

**[0042]** While the foregoing is directed to the preferred embodiment of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. For example, though the present invention is described in terms of an FPGA and embedded processor core, it should be understood that constructs other than an FPGA and an embedded processor core may be used, including, but not limited to, combinations formed of a programmable logic device and at least one of a memory, an Application Specific Integrated Circuit, an Application Specific Standard Product, a Digital Signal Processor, a microprocessor, a microcontroller, and the like.

**[0043]** All trademarks are the respective property of their owners.